Understanding Childhood Insomnia: Exploring Contributing Factors and Strategies for Resolution

Author: Jose G. Chavez

Introduction

Childhood Insomnia is a condition that holds immense significance due to its potential impact on the physical, mental, and emotional well-being of adolescents. As adolescents navigate the critical phase of development, the quality of their sleep plays a pivotal role in shaping their overall health, cognitive function, and emotional resilience. Understanding the factors that contribute to childhood insomnia is not only academically intriguing but also practically essential for devising interventions that promote healthier sleep patterns in this age group.

In this article, we delve into an exploratory analysis of the Adolescent Insomnia Study dataset. This dataset encapsulates a wealth of information regarding demographic, psychometric, clinical, and item-level data collected from a study focused on insomnia in adolescents. By embarking on this analysis, we aim to uncover hidden connections between various psychometric traits, coping mechanisms, and sleep quality.

This article is the first part of a series of articles that will explore the dataset and its potential implications/directions for further study.

I am not a clinician; I am a mathematics PhD with an interest in data and in child development and education.

The goal of this series as a whole is to explore the data and uncover potentially interesting connections between the various psychometric traits and coping skills present in adolescents.

We will be creating multiple notebooks that both explore various aspects of the data and demonstrate how to create models (machine learning and statistical) of various sorts (whatever is most appropriate to the data).

The author found the dataset on Kaggle: https://www.kaggle.com/datasets/utkarshx27/insomnia-symptomatology-in-adolescence

The analysis is divided into several steps, each of which is described in detail in the following sections.

import matplotlib.pyplot as plt
import seaborn as sns

# Optional: list the available matplotlib styles and pick one.
# for style_name in plt.style.available:
#     print(f"{style_name}: plt.style.use('{style_name}')")
# plt.style.use('seaborn-darkgrid')  # named 'seaborn-v0_8-darkgrid' since matplotlib 3.6

# Use small figures throughout the notebook
plt.rcParams['figure.figsize'] = [3, 2]
sns.set(rc={'figure.figsize': (3, 2)})

Contact me:

If you have any questions, would like to collaborate and/or have any relevant data you want to share please say hello at jgcblue9558@gmail.com!

Libraries We Need and Importing Data

We will be importing:

  • pandas: for its DataFrame data structures and the many tools that ship with it;
  • sklearn: a library that has many "models" just waiting to be instantiated and filled with your parameters;
  • seaborn: a nice alternative to the matplotlib library for data visualizations (to be precise, it is built on top of matplotlib). It has some more modern visual tools and is geared slightly more towards Statistics than matplotlib.
import pandas as pd

# Tools for splitting the dataset for training and testing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# The model
from sklearn.linear_model import LogisticRegression
# Tools for evaluating model performance
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import matplotlib.pyplot as plt
import seaborn as sns


# Load the data
insomnia_data = pd.read_csv('insomnia_data.csv')
insomnia_item_level_data = pd.read_csv('insomnia_item_level_data.csv')
insomnia_data_dictionary = pd.read_csv('insomnia_data_dictionary.csv')


#Even simpler names for these objects

ind=insomnia_data
ini=insomnia_item_level_data
indict=insomnia_data_dictionary




selected_rows = insomnia_data_dictionary['Columns'].iloc[[3,4,16,17,18]]  # Replace with your desired indices


#print(selected_rows)
pd.set_option('display.max_columns', 10)  # Display up to 10 columns without truncation



#display(insomnia_data)

On the CSV Files:

You may or may not have familiarity with the way that data collection is typically done in Statistical studies. Here we see a common arrangement for such a "package".

  1. A csv file with the data in its most typical form;
  2. A csv file with a dictionary for helping the reader understand what the features/items mean;
  3. An "item-level" csv file, where the items (the individual questions of each psychological questionnaire) are provided above the standard headings.

We will work (throughout this notebook at least) on the first two. Our datasets have a high number of columns, so we really will be forced to work with the dictionary csv. Good practice!

Data Cleaning

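I remark here that we are quite lucky in that the data set has been "engineered" quite well. It appears clean, with essentially no null values or missing information among the measured features, so this part of the project will be trivial. One caveat: the NaN correlations reported further down for unknown_Race, unknown_Etnicity, and Unnamed: 95 suggest that a few columns are constant or empty.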
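A claim like this is easy to check directly. The following is a minimal sketch, not part of the original notebook: the helper name missing_value_report and the toy values are assumptions made for illustration. It flags columns that contain missing values or only one distinct value, since both kinds of column produce NaN correlations.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values and distinct values for every column."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })
    # Keep only columns that are incomplete or constant; both yield NaN correlations.
    flagged = report[(report["n_missing"] > 0) | (report["n_unique"] <= 1)]
    return flagged.sort_values("n_missing", ascending=False)


# Toy stand-in for insomnia_data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "ISI_total": [3, 12, 18, 7],
    "unknown_Race": [0, 0, 0, 0],                      # constant column
    "Unnamed: 95": [np.nan, np.nan, np.nan, np.nan],   # empty column
})
flagged = missing_value_report(toy)
print(flagged)
```

On the real data one would call missing_value_report(insomnia_data) and inspect whatever comes back before deciding that no cleaning is needed.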

The Data Set .... Through Two Lenses

Since this is a notebook within a website dealing with both "Data Analysis" and "Data Science", I want to take a little bit of time to discuss what sorts of methodologies each may be suited for and why.

Let's consider both domains:

Data Analysis

Recall that Data Analysis is closely related to Statistics. In some ways it might be considered a superset and in others a subset (it doesn't require that one conduct studies, for instance). With 95 participants, the data we have is not tiny but definitely not large.

We can do things like:

  • Create scatter plots;
  • Calculate summary statistics such as the mean, standard deviation, mode, etc.;
  • Pose questions that could relate to actions/policy changes.

Data Science

When one uses the words Data Science, one is usually referring to the training of models (programs) capable of ingesting new data and predicting categories and/or numerical values. Most often one wants to predict labels, but strictly speaking the field is not limited to that.

One typically wants a lot of data. That is the main thing to remember when deciding whether or not to consider machine learning. How big "a lot" must be is somewhat flexible, however: some models perform well with smallish data sets.

Why are there multiple csv files?

If this is your first time looking at Statistics data, then it's probably your first time seeing this sort of arrangement. When working with such data sets, one often wants short encoded titles for columns, both for readability and for coding reasons. Of course, one can forget what these columns/features represent! The way around this issue is to include other associated documents (commonly as part of, say, an Excel workbook). That is why we have the file insomnia_data_dictionary.csv.

As for insomnia_item_level_data.csv, it comprises the self-reported, item-level responses to each questionnaire.

Let's take a look at the dictionary's contents using the display function.

display(insomnia_data_dictionary)
Columns Description
0 Group INSOMNIA= 1, CONTROL=0
1 SubGroup clean INSOMNIA= 2, subclinical INSOMNIA = 1, C...
2 Remote Remote data collection = 1, In person data col...
3 Sex MALE = 1, FEMALE = 0
4 Age Years
... ... ...
90 ders_goals DERS Difficulties Engaging in Goal-Directed Be...
91 ders_impulse DERS Impulse control difficulties (IMPULSE) (D...
92 ders_awareness DERS Lack of Emotional Awareness (AWARENESS) (...
93 ders_strategies DERS Limited Access to Emotion Regulation Stra...
94 ders_clarity DERS Lack of Emotional Clarity (CLARITY) (Diff...

95 rows × 2 columns

Now let's take a look at the insomnia data

display(insomnia_data)
ID Group SubGroup Remote Sex ... Zders_impulse Zders_awareness Zders_strategies Zders_clarity ZDERS_total
0 sub_001 0 0 0 0 ... -0.114565 1.083087 -0.656051 0.538016 0.575806
1 sub_002 0 0 0 0 ... -0.114565 1.083087 -0.656051 0.538016 0.153943
2 sub_003 0 0 0 1 ... -0.425527 0.271626 -0.656051 -0.260601 -0.619473
3 sub_004 0 0 0 0 ... -0.114565 0.596210 -0.116442 0.538016 0.224254
4 sub_005 1 2 0 1 ... 0.196397 0.109334 0.153363 1.336632 0.857049
... ... ... ... ... ... ... ... ... ... ... ...
90 sub_091 1 1 1 1 ... -0.736489 0.109334 -0.925856 0.937324 0.294564
91 sub_092 1 2 1 0 ... 0.818321 -0.702127 -0.656051 1.735941 0.013322
92 sub_093 1 2 1 1 ... -0.736489 0.920795 0.153363 2.135249 1.208601
93 sub_094 1 2 1 0 ... -1.358413 -2.162757 -1.735270 -2.656452 -2.939721
94 sub_095 1 1 1 0 ... 2.373131 -1.675881 2.581605 -1.059218 0.997670

95 rows × 174 columns

As you can see the dictionary csv file explains the labels in the insomnia csv. This is common practice when labels are too long to include in the dataframes' labels proper.
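With 174 columns it can be handy to look descriptions up programmatically rather than scrolling through the dictionary. Here is a minimal sketch; the helper describe_columns is hypothetical, and it assumes the 'Columns' / 'Description' layout printed above with no repeated column names.

```python
import pandas as pd


def describe_columns(dictionary: pd.DataFrame, names):
    """Return {column name: description} for the requested column names."""
    lookup = dictionary.set_index("Columns")["Description"]
    # reindex keeps names that are absent from the dictionary, with a NaN description
    return lookup.reindex(list(names)).to_dict()


# Toy copy of three rows from the dictionary printed above
toy_dict = pd.DataFrame({
    "Columns": ["Group", "Sex", "Age"],
    "Description": ["INSOMNIA= 1, CONTROL=0", "MALE = 1, FEMALE = 0", "Years"],
})
info = describe_columns(toy_dict, ["Sex", "Age"])
print(info)
```

On the real files the call would look like describe_columns(insomnia_data_dictionary, ['Sex', 'Age']).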

Exploring the Dataset

First let's take a look at the dataset using pandas' head method, which allows us to see a manageable amount of the data.

print(insomnia_data.head(1))
        ID  Group  SubGroup  Remote  Sex  ...  Zders_impulse  Zders_awareness  \
0  sub_001      0         0       0    0  ...      -0.114565         1.083087   

   Zders_strategies  Zders_clarity  ZDERS_total  
0         -0.656051       0.538016     0.575806  

[1 rows x 174 columns]

Investigating What the Labels Mean

Recall that we can use the insomnia_data_dictionary to understand what the labels mean.

# Display the first few rows of each dataframe to understand the structure of the data
print('Insomnia Data:')
display(insomnia_data.head())
print('\nInsomnia Item Level Data:')
display(insomnia_item_level_data.head())
print('\nInsomnia Data Dictionary:')
display(insomnia_data_dictionary.head())
Insomnia Data:
ID Group SubGroup Remote Sex ... Zders_impulse Zders_awareness Zders_strategies Zders_clarity ZDERS_total
0 sub_001 0 0 0 0 ... -0.114565 1.083087 -0.656051 0.538016 0.575806
1 sub_002 0 0 0 0 ... -0.114565 1.083087 -0.656051 0.538016 0.153943
2 sub_003 0 0 0 1 ... -0.425527 0.271626 -0.656051 -0.260601 -0.619473
3 sub_004 0 0 0 0 ... -0.114565 0.596210 -0.116442 0.538016 0.224254
4 sub_005 1 2 0 1 ... 0.196397 0.109334 0.153363 1.336632 0.857049

5 rows × 174 columns

Insomnia Item Level Data:
Unnamed: 0 Pittsburgh Sleep Quality Index (PSQI) Pittsburgh Sleep Quality Index (PSQI).1 Pittsburgh Sleep Quality Index (PSQI).2 Pittsburgh Sleep Quality Index (PSQI).3 ... Race, Ethnicity & Sex.5 Race, Ethnicity & Sex.6 Race, Ethnicity & Sex.7 Race, Ethnicity & Sex.8 Race, Ethnicity & Sex.9
0 NaN During the past month, what time have you usua... During the past month, how long (in minutes) h... During the past month, what time have you usua... How much time (in minutes) do you usually spen... ... Race (choice=<strong>Unknown / Not Reported</s... Ethnicity (choice=<strong>Hispanic or Latino</... Ethnicity (choice=<strong>NOT Hispanic or Lati... Ethnicity (choice=<strong>Unknown / Not Report... Gender (0=Female, 1=Male)
1 Record ID psqi1 psqi2 psqi3 psqi_4a ... race___5 ethnicity___0 ethnicity___1 ethnicity___2 Sex
2 sub_001 2200 10 700 15 ... 0 0 1 0 0
3 sub_002 2200 10 500 10 ... 0 0 1 0 0
4 sub_003 2300 45 730 0 ... 0 0 1 0 1

5 rows × 471 columns

Insomnia Data Dictionary:
Columns Description
0 Group INSOMNIA= 1, CONTROL=0
1 SubGroup clean INSOMNIA= 2, subclinical INSOMNIA = 1, C...
2 Remote Remote data collection = 1, In person data col...
3 Sex MALE = 1, FEMALE = 0
4 Age Years
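The item-level preview above shows that its first two data rows hold question text and short item codes (psqi1, psqi2, ...) rather than participant responses, which forces pandas to treat every column as text. The sketch below shows one possible way to read such a file so that the short codes become the column names. It runs on a shortened toy copy of the layout, and whether the real file follows exactly this three-line header is an assumption.

```python
import io

import pandas as pd

# Toy text mimicking the three header lines seen in the preview (hypothetical, shortened)
raw = io.StringIO(
    ",Pittsburgh Sleep Quality Index (PSQI),Pittsburgh Sleep Quality Index (PSQI)\n"
    ",Usual bedtime during the past month,Minutes to fall asleep\n"
    "Record ID,psqi1,psqi2\n"
    "sub_001,2200,10\n"
    "sub_002,2200,10\n"
    "sub_003,2300,45\n"
)

# header=2 uses the third line (short item codes) as column names
# and discards the two lines above it
items = pd.read_csv(raw, header=2)
print(items.dtypes)
```

With the descriptive rows out of the way, the response columns are parsed as numbers instead of strings.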

Brief Aside on Z-scores:

Z-scores, also known as standard scores, are a statistical measure that quantifies how far a particular data point is from the mean of a dataset when measured in terms of standard deviations. They are used to standardize data and allow comparisons between data points that may have different units or scales. Z-scores are calculated using the formula:

$$ Z = \frac{x - \mu}{\sigma} $$

Where:

  • $ Z $ is the z-score.
  • $ x $ is the individual data point.
  • $ \mu $ is the mean of the dataset.
  • $\sigma $ is the standard deviation of the dataset.

Here's what z-scores mean:

  1. Sign and Direction: The sign of the z-score (+ or -) indicates whether the data point is above or below the mean. Positive z-scores indicate data points above the mean, while negative z-scores indicate data points below the mean.

  2. Magnitude: The magnitude of the z-score indicates how many standard deviations the data point is from the mean. A larger absolute z-score implies that the data point is farther from the mean in terms of standard deviations.

  3. Comparison: Z-scores allow you to compare data points from different distributions. By standardizing data, you can assess how unusual or typical a particular data point is within its distribution.

  4. Outliers: Data points whose z-scores fall beyond a chosen threshold (commonly around ±2 or ±3) are often treated as potential outliers, as they deviate substantially from the mean.

  5. Normalization: Z-scores standardize data, making it easier to analyze and compare data with different units and scales.
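To make the formula concrete, here is a small sketch that standardizes a toy list of scores. The zscore helper and the numbers are made up for illustration. The choice of the sample standard deviation (ddof=1) is also an assumption, since the dataset does not say which convention its Z columns use.

```python
import pandas as pd


def zscore(values: pd.Series, ddof: int = 1) -> pd.Series:
    """Standardize a series: subtract the mean, divide by the standard deviation."""
    return (values - values.mean()) / values.std(ddof=ddof)


# Toy scores (hypothetical, not taken from the dataset)
scores = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])
z = zscore(scores)
print(z.round(3).tolist())
```

The middle score equals the mean, so its z-score is 0; the standardized values as a whole have mean 0 and standard deviation 1.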

Z-scores and our Data

Recall that there are many columns whose labels are prepended with "Z". These column values are the z-scores of the participant with respect to that trait. For example, if a subject has a z-score of 0 in ZDERS_total, the standardized version of ders_total (the Difficulties in Emotion Regulation Scale total score), then that person has exactly the average/mean value for that particular psychometric.

## Gathering Column Names
#column_names_tuple = tuple(insomnia_data.columns)

#print(column_names_tuple)
#print(len(column_names_tuple))

A Look at Some Correlations for All Races/Ethnicities

target_column = 'ISI_total'

# Calculate correlations with the chosen column
correlations = insomnia_data.corr(numeric_only=True)[target_column].sort_values(ascending=False)

pd.set_option('display.max_rows', 50)
print(correlations.head(10))
ISI_total        1.000000
ZISI_total       1.000000
ZPSQI_total      0.708189
PSQI_total       0.708189
SubGroup         0.692995
ZGCTI_total      0.619535
GCTI_total       0.619535
GCTI_anxiety     0.617297
ZGCTI_anxiety    0.617297
Group            0.615157
Name: ISI_total, dtype: float64
correlations_df = pd.DataFrame(correlations)

# Reset the index and add a column for variable names
correlations_df.reset_index(inplace=True)
correlations_df.columns = ['Variable', 'Correlation']

# Print the DataFrame with multiple columns
#print(correlations_df)
pd.set_option('display.max_columns', None)
df_t = correlations_df.T

display(df_t)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
Variable ISI_total ZISI_total ZPSQI_total PSQI_total SubGroup ZGCTI_total GCTI_total GCTI_anxiety ZGCTI_anxiety Group BDI_total ZBDI_total FIRST_total ZFIRST_total GCTI_reflection ZGCTI_reflection NEO_agreeableness ZNEO_agreeableness GCTI_worries ZGCTI_worries GCTI_negativeAffect ZGCTI_negativeAffect Remote asq_school Zasq_school asq_future Zasq_future ZGCTI_thoughts GCTI_thoughts casq_sleepy Zcasq_sleepy ZTCQIR_worry TCQIR_worry Zasq_leisure asq_leisure casq_alert Zcasq_alert Zasq_attendance asq_attendance asq_teacher Zasq_teacher ZTCQI_R_Total TCQI_R_Total TCQIR_social_avoidance ZTCQIR_social_avoidance TCQIR_behavtioral_distraction ZTCQIR_behavtioral_distraction NotHispanic ZTCQIR_reappraisal TCQIR_reappraisal asq_peer Zasq_peer asq_romantic Zasq_romantic NEO_neuroticism ZNEO_neuroticism cope_disengage_su Zcope_socialsupp_instr cope_socialsupp_instr Zcope_planning cope_planning asq_home Zasq_home Zcope_emotions cope_emotions Zders_goals ders_goals PSRS_RWO ZPSRS_RWO NEO_Conscientiousness ZNEO_Conscientiousness cope_active Zcope_active Zcope_disengage_su ASHS_substances Zcope_acccept cope_acccept PSRS_RSE ZPSRS_RSE ZNEO_openness NEO_openness cope_socialsupp_emo Zcope_socialsupp_emo ders_impulse cope_disengage_mental Zcope_disengage_mental cope_humor Zcope_humor Zcope_religion casq_total Zcasq_total Zders_impulse ZASHS_substances ASHS_bedtimeRoutine ZASHS_bedtimeRoutine ZNEO_extraversion NEO_extraversion Zasq_responsibility American_Indian Asian White ZTCQIR_Aggressive_supression TCQIR_Aggressive_supression Native_Hawaiian cope_denial Zcope_denial ders_nonaccpetance Zders_nonaccpetance cope_growth Zcope_growth Zders_strategies ders_strategies Zcope_disengage_emo cope_disengage_emo cope_suppression Zcope_suppression asq_finance PSRS_total ZPSRS_total cope_restraint Zcope_restraint Zasq_finance Black ASHS_BedroomSharing ZDERS_total ders_total asq_responsibility MEQr_total ZMEQr_total PSRS_RSC ZPSRS_RSC TCQIR_cognitive_distraction ZTCQIR_cognitive_distraction ACE_tot 
ZASHS_BedroomSharing PSRS_PrR ZPSRS_PrR ZACE_tot STAI_Y_total ZSTAI_Y_total Zders_clarity ders_clarity PSS_total ZPSS_total Sex ZASHS_SleepEnvirnmont ASHS_SleepEnvirnmont PSRS_FRa ZPSRS_FRa PDS_FEMALE ZASHS_sleepStability ASHS_sleepStability ders_awareness Zders_awareness ASHS_DaytimeSleep ZASHS_DaytimeSleep Hispanic Age PDS_MALE ASHS_emotional ZASHS_emotional ASHS_total ZASHS_total ZASHS_cognitive ASHS_cognitive unknown_Race unknown_Etnicity Unnamed: 95
Correlation 1.0 1.0 0.708189 0.708189 0.692995 0.619535 0.619535 0.617297 0.617297 0.615157 0.596929 0.596929 0.523398 0.523398 0.509471 0.509471 0.465616 0.465616 0.459035 0.459035 0.458402 0.458402 0.453848 0.407773 0.407773 0.38315 0.38315 0.352051 0.352051 0.316141 0.316141 0.30236 0.30236 0.301567 0.301567 0.292942 0.292942 0.286191 0.286191 0.28059 0.28059 0.275811 0.275811 0.222331 0.222331 0.220245 0.220245 0.21373 0.19261 0.19261 0.182992 0.182992 0.181919 0.181919 0.169643 0.169643 0.168497 0.165903 0.165903 0.164836 0.164836 0.162166 0.162166 0.155069 0.155069 0.125583 0.125583 0.120577 0.120577 0.116151 0.116151 0.108338 0.108338 0.104242 0.103794 0.100861 0.100861 0.093433 0.093433 0.092483 0.092483 0.09132 0.09132 0.085062 0.07301 0.07301 0.065751 0.065751 0.062734 0.062725 0.062725 0.058432 0.050481 0.040162 0.040162 0.039893 0.039893 0.037892 0.032293 0.029444 0.028738 0.026792 0.026792 0.022713 0.019374 0.019374 0.017677 0.017677 0.017521 0.017521 0.012895 0.012895 0.008314 0.008314 0.006285 0.006285 0.002789 -0.00443 -0.00443 -0.013747 -0.013747 -0.03327 -0.040749 -0.050997 -0.063225 -0.063225 -0.065147 -0.069157 -0.069157 -0.072064 -0.072064 -0.075808 -0.075808 -0.079921 -0.082231 -0.085338 -0.085338 -0.088105 -0.102072 -0.102072 -0.106103 -0.106103 -0.11328 -0.11328 -0.11725 -0.125915 -0.125915 -0.16801 -0.16801 -0.18813 -0.223113 -0.223113 -0.224974 -0.224974 -0.227422 -0.227422 -0.235089 -0.291307 -0.339504 -0.374654 -0.37492 -0.416037 -0.416075 -0.452816 -0.452829 NaN NaN NaN

Highest Positive Correlations with ISI_total Identified for Total Population

We see that these are:

  1. ZPSQI_total: z-score of PSQI_total
  2. PSQI_total: PSQI total (Pittsburgh Sleep Quality Index)
  3. SubGroup: the finer-grained insomnia grouping described in the data dictionary
  4. ZGCTI_total
  5. GCTI_total
  6. GCTI_anxiety
  7. ZGCTI_anxiety

    Items 4-7 are all GCTI measures, two of them the anxiety subscale, so it's reasonable to guess that anxiety is the factor most strongly associated with insomnia severity. Correlation alone cannot tell us that it is the cause.

As we can see (trivially), the ISI_total and ZISI_total columns correlate perfectly. More interesting is that PSQI_total, a separate sleep-quality measure, has the next-highest correlation with ISI_total (about 0.71).
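Every raw column and its Z-prefixed twin show the same correlation because the Pearson correlation is unchanged when a variable is shifted and rescaled by a positive constant, which is all that z-scoring does. The ranking is therefore easier to read with the duplicates removed. Below is a minimal sketch under that assumption: the helper rank_correlations is hypothetical, it matches names case-insensitively because pairs such as ZDERS_total and ders_total differ in case, and the toy numbers are invented.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlate every numeric column with `target`, skipping redundant columns."""
    corr = df.corr(numeric_only=True)[target].dropna()
    raw_names = {name.lower() for name in df.columns}
    # Drop the target itself and any 'Z'-prefixed twin of an existing column
    redundant = [
        name for name in corr.index
        if name == target or (name.startswith("Z") and name[1:].lower() in raw_names)
    ]
    return corr.drop(redundant).sort_values(ascending=False)


# Toy data (hypothetical values) with z-scored twins of two columns
toy = pd.DataFrame({
    "ISI_total": [2, 8, 15, 20, 11],
    "PSQI_total": [3, 6, 12, 13, 9],
    "ASHS_total": [5.1, 4.0, 3.2, 2.9, 4.4],
})
for name in ["ISI_total", "PSQI_total"]:
    toy["Z" + name] = (toy[name] - toy[name].mean()) / toy[name].std()

ranked = rank_correlations(toy, "ISI_total")
print(ranked)
```

Calling rank_correlations(insomnia_data, 'ISI_total') on the real table would roughly halve the length of the list above, and the dropna() also removes the columns whose correlation is undefined.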

Finding Negative or Lesser Correlations

As I pondered the data, I wondered whether any of the other factors, such as those associated with "coping mechanisms" or anything else that might help alleviate insomnia or improve general mood, correlate negatively with insomnia severity. If they do, then one might consider recommending them to patients.

correlations = insomnia_data.corr(numeric_only=True)[target_column].sort_values(ascending=True)

display(correlations)
ASHS_cognitive     -0.452829
ZASHS_cognitive    -0.452816
ZASHS_total        -0.416075
ASHS_total         -0.416037
ZASHS_emotional    -0.374920
                      ...   
ZISI_total          1.000000
ISI_total           1.000000
unknown_Race             NaN
unknown_Etnicity         NaN
Unnamed: 95              NaN
Name: ISI_total, Length: 168, dtype: float64

Unsurprisingly, we see members of the ASHS (Adolescent Sleep Hygiene Scale) metrics showing up. What's interesting is their (perhaps) relative importance in alleviating insomnia.

It is the cognitive subscale, ASHS_cognitive, that correlates most negatively with insomnia severity (about -0.45).

Visualizing Correlations

#Let's select some interesting features
features_study1 =['Age','ISI_total','ASHS_total','PSQI_total']

subdf_one_1=insomnia_data[features_study1]
#plt.figure(figsize=(3, 2))  # Width: 3 inches, Height: 2 inches


sns.pairplot(subdf_one_1, height=2, aspect =1)
plt.show()

There seems to be some correlation between Age and ders_awareness, although ders_awareness is not among the four features plotted above and would need to be added to features_study1 to confirm this visually. I also see something of a linear relationship between ASHS_total and ISI_total. What's interesting is that in the latter case the correlation is negative.

Comparing Ethnicities

Let's take a look at the potential correlations between being Hispanic and the various disorders. First we take a look at females (indicated by a 0 in the sex column of the dataframe).

# Pair plot for Hispanic females
columns_hispanic_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total']

sns.set(style="whitegrid")

# Keep only rows where Hispanic is 1 and Sex is 0 (female)
subdf = insomnia_data[(insomnia_data['Hispanic'] == 1) & (insomnia_data['Sex'] == 0)]

subdf_hisp_1 = subdf[columns_hispanic_1]

sns.pairplot(subdf_hisp_1, height=2, aspect=1)
plt.show()
    

Next let's take a look at males.

columns_hispanic_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total']

# Keep only rows where Hispanic is 1 and Sex is 1 (male)
subdf = insomnia_data[(insomnia_data['Hispanic'] == 1) & (insomnia_data['Sex'] == 1)]

# Select the columns from the filtered rows (not from the full dataset)
subdf_hispanic_1 = subdf[columns_hispanic_1]

sns.set(style="whitegrid")
sns.set_context("notebook", rc={"figure.figsize": (3, 2)})

sns.pairplot(subdf_hispanic_1, height=2, aspect=1)
plt.show()

Observations

It looks like for both genders there's something of a correlation between Age and ders_awareness, though ders_awareness would need to be added to the plotted columns to confirm this.

That could imply that, for Hispanic participants or for participants in general, there's some measurable increase in this feature as one ages. Let's continue looking at these features in this way, but this time for Asian populations.
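Since the whole study has only 95 participants, each ethnicity-by-sex subgroup may contain very few rows, and a pattern in a pair plot can come from a handful of points. A small helper that reports the subgroup size next to the correlation makes this visible. The sketch below uses a hypothetical function name and invented toy values; only the column names come from the dataset.

```python
import pandas as pd


def subgroup_corr(df: pd.DataFrame, conditions: dict, x: str, y: str):
    """Filter rows matching every column == value pair, then return (n_rows, Pearson r)."""
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    subset = df.loc[mask, [x, y]]
    # With fewer than 3 rows a correlation is not informative, so report NaN instead
    r = subset[x].corr(subset[y]) if len(subset) >= 3 else float("nan")
    return len(subset), r


# Toy stand-in for insomnia_data (hypothetical values)
toy = pd.DataFrame({
    "Hispanic": [1, 1, 1, 0, 0, 1],
    "Sex": [0, 0, 0, 0, 1, 1],
    "Age": [16.5, 17.0, 18.2, 19.0, 16.9, 17.5],
    "ISI_total": [14, 11, 6, 3, 9, 12],
})
n, r = subgroup_corr(toy, {"Hispanic": 1, "Sex": 0}, "Age", "ISI_total")
print(n, round(r, 3))
```

On the real table, a call such as subgroup_corr(insomnia_data, {'Hispanic': 1, 'Sex': 1}, 'Age', 'ISI_total') would show how many participants stand behind each plot.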

columns_asian_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total']

# Keep only rows where Asian is 1
subdf = insomnia_data[insomnia_data['Asian'] == 1]

# Select the columns from the filtered rows (not from the full dataset)
subdf_asian_1 = subdf[columns_asian_1]

sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()

Asian Scores Based on Gender

Males

### Breaking Down With Respect to Gender

columns_asian_1 = ['Age', 'ASHS_total', 'ISI_total', 'ders_awareness']

# Keep only rows where Asian is 1 and Sex is 1 (male)
subdf = insomnia_data[(insomnia_data['Asian'] == 1) & (insomnia_data['Sex'] == 1)]

# Select the columns from the filtered rows (not from the full dataset)
subdf_asian_1 = subdf[columns_asian_1]

sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()

Females

columns_asian_1 = ['Age', 'ASHS_total', 'ISI_total', 'ders_awareness']

# Keep only rows where Asian is 1 and Sex is 0 (female)
subdf = insomnia_data[(insomnia_data['Asian'] == 1) & (insomnia_data['Sex'] == 0)]

# Select the columns from the filtered rows (not from the full dataset)
subdf_asian_1 = subdf[columns_asian_1]

sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()
## Plotting Together:

columns_asian_1=['Sex','Age','ASHS_total','ISI_total','ders_awareness']


subdfAsian= insomnia_data[(insomnia_data['Asian']==1)]


subdfAsian = subdfAsian[columns_asian_1]

print(subdf_one_1['Age'])

sns.scatterplot(data=subdfAsian, x='Age', y='ISI_total', hue='Sex', style='Sex')
plt.title('ISI_total vs. Age (Asian participants)')
plt.xlabel('Age')
plt.ylabel('ISI_total')
plt.legend(title='Sex (1 = male, 0 = female)')
plt.show()
0     19.3
1     19.3
2     18.8
3     18.8
4     19.6
      ... 
90    16.9
91    16.6
92    17.3
93    16.8
94    16.7
Name: Age, Length: 95, dtype: float64
correlation_matrix = subdfAsian.corr()
print(correlation_matrix)
                     Sex       Age  ASHS_total  ISI_total  ders_awareness
Sex             1.000000 -0.277602    0.027418  -0.336290        0.106390
Age            -0.277602  1.000000    0.061980  -0.283078        0.362403
ASHS_total      0.027418  0.061980    1.000000  -0.456341       -0.120614
ISI_total      -0.336290 -0.283078   -0.456341   1.000000       -0.109638
ders_awareness  0.106390  0.362403   -0.120614  -0.109638        1.000000

We seem to see something like a normal distribution at the left bottom corner.

Conclusions and Future Directions

Conclusions: Our analysis of the Adolescent Insomnia Study dataset has brought to light intriguing correlations that provide a glimpse into the intricate landscape of childhood insomnia. Notably, the fairly strong correlation (about 0.62) between the GCTI anxiety scores and insomnia severity underscores the interplay between emotional well-being and sleep quality in adolescents. This finding accentuates the need for a holistic approach to addressing insomnia issues, one that encompasses both psychological and physiological factors.

Equally noteworthy are the negative correlations observed between the sleep hygiene scores, particularly the cognitive subscale, and insomnia severity. This suggests, though correlation alone cannot establish it, the potential efficacy of cognitive interventions in managing insomnia among adolescents. By incorporating cognitive strategies into educational programs or therapeutic approaches, we might empower adolescents with practical tools to enhance their sleep quality and overall well-being.

Future Directions: Our exploration, though illuminating, merely scratches the surface of the rich dataset at hand, whose main table alone has 174 columns and whose item-level table has 471. This vast dataset invites us to embark on a journey of deeper understanding and multidimensional exploration:

Feature Analysis: The dataset's breadth provides a unique opportunity to conduct an in-depth analysis of other features beyond the ones we explored. By systematically examining each feature's correlation with insomnia severity and other relevant metrics, we can uncover additional factors that might contribute to or mitigate childhood insomnia.

Feature Engineering: Beyond simple correlations, feature engineering techniques can help unveil intricate patterns and interactions within the data. Dimensionality reduction techniques such as principal component analysis (PCA) can reveal relationships that might not be evident at first glance.

Clustering and Subgroup Analysis: Employing clustering algorithms, we can identify subgroups within the dataset that exhibit distinct sleep patterns, coping mechanisms, and psychometric traits. This could lead to the discovery of novel sleep-related profiles and help tailor interventions to specific groups.

Time Series Analysis: If the dataset includes temporal information, time series analysis can provide insights into the evolution of sleep patterns and psychometric traits over time. This approach is especially valuable for understanding the dynamic nature of childhood insomnia.

Predictive Modeling: Leveraging machine learning, we can develop predictive models that anticipate insomnia severity based on a combination of features. This predictive capability could inform early interventions and personalized strategies.

Ethnic and Demographic Variations: Given the diverse nature of the dataset, exploring how various features interact with ethnicity, socioeconomic status, and other demographic factors could unveil unique insights into how different populations experience childhood insomnia.

In essence, while our current analysis paints a compelling picture, the true potential of this dataset lies in its depth and diversity. By embracing the complexity of this dataset, researchers can unravel the multifaceted nature of childhood insomnia and contribute to informed interventions that cater to diverse needs.

In summary, as we journey forward, let's recognize that the exploration of this dataset is an ongoing endeavor. Its richness holds the promise of unearthing countless insights that could revolutionize our understanding of childhood insomnia and its management. By embracing the challenge and harnessing the power of advanced analytical techniques, we can make substantial strides in improving the sleep quality and well-being of adolescents worldwide.

In Part 2 we will explore:

  1. Linear Regressions and other models for predicting ISI_total
  2. Is there a peak at around 18 years old? Is it true for all age groups?
  3. Which groups of people were most represented and what implications might that have for the study?